ABSTRACT

This paper focuses on the analysis of real-time non-preemptive
multiprocessor scheduling with precedence and several latency
constraints. It aims to specify a schedulability condition which
enables a designer to check a priori, without executing or
simulating, whether a scheduling of tasks will respect the precedence
between tasks as well as several latency constraints imposed on
given pairs of tasks. It is shown that the required analysis is
closely linked to the topological structure of the application graph.
More precisely, it depends on the configuration of the task paths
subject to latency constraints. As a result of the study, a
sufficient schedulability condition is introduced for precedence and
latency constraints in the hardest configuration in terms of
complexity, with an optimal number of processors in terms of
application parallelism. In addition, the proposed conditions provide
practical lower bounds for general cases. Performance results and
comparisons with an optimal approach demonstrate the effectiveness of
the proposed approach.

General Terms: 

Distributed Systems, Real-Time Operating Systems

Keywords: 

Real-Time Systems, Multiprocessor Scheduling, Schedulability 
Analysis, Combinatorial Problems, Latency Constraints 

1. INTRODUCTION 

Nowadays, computer applications in which computation must satisfy
stringent timing constraints are widespread. In such applications,
failure to meet the specified deadlines can lead to a serious
degradation of the system, and can also result in catastrophic loss
of life or property. Increasing computing requirements lead to the
distribution of real-time applications over multi-core platforms.
However, in addition to the complexity of parallelizing such
applications, system designers face the problem of how to deal with
application parameters in such a way that their temporal constraints
are met. The formalization of the performance of parallelizable
applications dates back to 1967 with Amdahl's law [1], which was
followed by a large number of works, one of which is [2].



The challenge is to ensure, by providing formal methods, that the
real-time requirements of distributed applications are satisfied.
Scheduling requires a scheduling algorithm, that is, a set of rules
defining the execution of tasks at system runtime. At the same time,
it is important to provide a schedulability analysis, which
determines whether a set of tasks with parameters describing their
temporal behavior will meet their temporal constraints. The result of
such a test is typically a yes or a no, indicating whether the
constraints will be satisfied or not. These schemes and tests demand
precise assumptions about task properties, which hold for the entire
system lifetime. In addition, a set of processors is available for
executing a set of distributed real-time applications. Each computing
element might be a processor in a multi-processor architecture, a
host, or a core in a multi-core machine. Without loss of generality,
the term 'processor' is used in the present paper for all of them.
In this paper, a theoretical study is performed to analyze a system
of real-time tasks under precedence and several latency constraints.
The latency constraints addressed in this work are those imposed by
the system designer between predefined pairs of tasks of the
application graph. Latency constraint analysis can be used to test,
both at design time and for on-line execution, whether the time
elapsed between the executions of the tasks of a pair does not exceed
an already specified value and, so, whether the tasks meet their
deadlines. It constitutes a serious alternative to extensive testing
and simulation by providing analytical latency bounds, which
contribute considerably to process monitoring and control
applications that require real-time performance guarantees.

As mentioned previously, the paper focuses on non-preemptive
scheduling. This choice is motivated by a variety of reasons,
including [3]:

— In many practical real-time scheduling problems such as I/O 
scheduling, properties of device hardware and software either 
make preemption impossible or prohibitively expensive. The 
preemption cost is either not taken into account or still not re- 
ally controlled; 

— Non-preemptive scheduling algorithms are easier to implement 
than preemptive algorithms, and can exhibit dramatically lower 
overhead at runtime; 

— The overhead of preemptive algorithms is more difficult to 
characterize and predict than that of non-preemptive algorithms. 
Since scheduling overhead is often ignored in scheduling models,
an implementation of a non-preemptive scheduler will be
closer to the formal model than an implementation of a preemp- 
tive scheduler. 



For these reasons, designers often use non-preemptive approaches,
even though elegant theoretical results on preemptive approaches do
not extend easily to them [4]. Directed acyclic graphs (DAGs) are
also widely used to model different kinds of structures in
mathematics and computer science. Indeed, in many real-time systems,
applications are developed using DAGs [5] where vertices represent
sequential code segments and edges represent precedence constraints.
Throughout the paper, it is explained that a latency constraint is
strongly linked to the topology of the application graph or, more
accurately, to the parts of the graph concerned by latency
constraints.

There is a large literature in the real-time community on scheduling
tasks on multi-processor architectures. Sporadic and aperiodic
real-time tasks are considered in [6] and [7] respectively, whereas
energy-efficient scheduling is proposed in [8]. In [9], QoS
management is proposed, and [10] aims to minimize either the overall
bandwidth consumption or the required number of cores. However, to
our knowledge, schedulability analysis dealing with several latency
constraints (as defined in this paper) has not been considered. In
fact, among the constraints addressed in real-time scheduling,
latency constraints are less studied than, for example, the
periodicity constraint [11]. Nevertheless, latency is a major concern
in several fields, such as embedded signal processing applications
[12]. In the literature, authors most often talk about an end-to-end
deadline, which ensures that the time elapsed between sensors and
actuators does not exceed a certain value [13]. The main difference
between latency and end-to-end deadline is that the system designer
can impose as many latency constraints as needed, between any pair of
connected tasks in the system (not necessarily sensor and actuator
tasks only). In [14], a definition of this constraint is given and
the existence of a link between deadlines and latency is proven. In
addition, distributed architectures involve inter-processor
communications, the cost of which must be taken into account
accurately. Furthermore, concerning synchronization cost reduction,
the approach proposed in [15] is efficient in terms of finding a
minimal set of interprocessor synchronizations; however, this
approach assumes that some dependences can be removed even though
data are exchanged. Moreover, it is not suitable for satisfying
latency constraints because it imposes a task scheduling that does
not exploit the potential task parallelism, which is essential in
minimizing the total execution time. Finally, it was not possible to
exploit results from the parallelism community, essentially because
they do not take precedence constraints into account [16].

The main contribution of this paper is a set of schedulability
conditions for latency constraints in the hardest configuration, with
an optimal number of processors in terms of application parallelism.
This configuration is the hardest among the possible configurations
because of the interdependence of latency constraints. From these
conditions, practical lower bounds for latency constraint values are
also deduced, whose efficiency and speed are shown by evaluation
tests.

The paper is organized as follows: Section 2 introduces the model 
and defines the latency constraint. Section 3 introduces the schedu- 
lability analysis through the different possible cases. Section 4 de- 
scribes the performance evaluation. 
2. DEFINITIONS AND MODEL

The paper deals with systems of real-time tasks with precedence and
several latency constraints. A task $t_i$ is characterized by a
worst-case execution time (WCET) $C(t_i) \in \mathbb{N}$. The
precedences between tasks are represented by a directed acyclic graph
(DAG) denoted $G$ such that $G = (V, E)$. $V$ is the set of tasks
characterized as above, and $E \subseteq V \times V$ is the set of
edges which represent the precedence (dependence) constraints between
tasks. Therefore, the directed pair of tasks $(t_a, t_b) \in E$ means
that $t_b$ can be scheduled only if $t_a$ was already scheduled, and
$t_a$ is called a predecessor of $t_b$. The set of tasks belonging to
all paths from $t_a$ to $t_b$, including $t_a$ and $t_b$, is denoted
by $\mathcal{V}$. Note that the architecture platform is composed of
identical processors.

A communication cost is involved when dependent tasks are scheduled
on two different processors, whereas the communication cost is
considered negligible if dependent tasks are scheduled on the same
processor. In our study, the overall communication overhead involved
by the interaction between processors is taken into account. If $M$
is the function giving the time needed for communication, then $M$
can vary linearly with the number of processors:
$M(m) = Q \cdot (m - 1)$, where $Q$ is a constant which depends on
the architecture and stands for an average communication cost between
a pair of processors, and $m$ is the number of processors. $M$ can
also vary logarithmically, since communications can be designed so as
to have a logarithmic impact on the total execution time. For
example, communications can be parallelized in the case of
hierarchical topology architectures, and the function $M$ becomes
$M(m) = Q \cdot \log m$.
Nevertheless, it is important to notice that in the targeted
applications, granularity is chosen so as to get a high
computation-to-communication ratio. When the granularity is large,
the computation cost becomes dominant, and the relatively small (but
non-negligible) communication cost actually encourages the use of
more processors to reduce the scheduling time. This implies more
opportunity for performance increase but, nevertheless, makes
efficient load balancing harder [17].
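
The two overhead models above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the base-2 logarithm is an assumption, since the text only writes log m.

```python
import math


def comm_overhead_linear(m: int, q: float) -> float:
    """Linear model: M(m) = Q * (m - 1)."""
    if m < 1:
        raise ValueError("at least one processor is required")
    return q * (m - 1)


def comm_overhead_log(m: int, q: float) -> float:
    """Logarithmic model: M(m) = Q * log(m). Base 2 is an assumption."""
    if m < 1:
        raise ValueError("at least one processor is required")
    return q * math.log2(m)


if __name__ == "__main__":
    # Compare both models for a hypothetical constant Q = 0.5.
    for m in (1, 2, 4, 8):
        print(m, comm_overhead_linear(m, 0.5), comm_overhead_log(m, 0.5))
```

In both models a single processor incurs no communication overhead, which matches the assumption that communication between tasks on the same processor is negligible.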

Each task $t_i$ has a start time $S(t_i)$ determined by the
scheduling algorithm. A latency constraint is defined only between
two tasks connected in the task graph, which means that there exists
at least one path connecting the two tasks. By imposing a latency
constraint $L(t_a, t_b)$, the time elapsed between the execution
start of $t_a$ and the execution start of $t_b$ must be less than or
equal to an integer, also denoted by $L(t_a, t_b)$, which is already
known. As tasks $t_a$ and $t_b$ are connected in the graph by one or
several paths, $\mathcal{P}(t_a, t_b)$ denotes the set of paths $p_i$
which connect $t_a$ to $t_b$. Hence, $\mathcal{P}(t_a, t_b)$ is also
a set of sets of tasks, meaning that $t_i \in p_j$ with
$p_j \in \mathcal{P}(t_a, t_b)$. The length of $p_i$ is denoted by
$|p_i|$ such that $|p_i| = \sum_{t_j \in p_i} C(t_j)$.
Among the paths $p_i$, $l_p$ denotes the longest one.
More formally, a latency constraint $L(t_a, t_b)$ is met if and only
if:

$$S(t_b) - S(t_a) \leq L \qquad (1)$$

In the task graph of Figure 1,
$\mathcal{P}(t_1, t_7) = \{p_1, p_2, p_3, p_4, p_5, p_6, p_7\}$ such
that:
$p_1 = \{t_1, t_2, t_3, t_4, t_5, t_6, t_7\}$,
$p_2 = \{t_1, t_8, t_9, t_4, t_5, t_6, t_7\}$,
$p_3 = \{t_1, t_2, t_3, t_4, t_5, t_{10}, t_7\}$,
$p_4 = \{t_1, t_8, t_9, t_4, t_5, t_{10}, t_7\}$,
$p_5 = \{t_1, t_2, t_{11}, \ldots, t_7\}$,
$p_6 = \{t_1, t_2, t_{11}, t_4, t_5, t_{10}, t_7\}$ and
$p_7 = \{t_1, t_2, t_{11}, t_6, t_7\}$.
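
The definitions of this section (set of paths, path length, longest path and condition (1)) can be illustrated with a short sketch. The 4-task graph, its WCETs and all function names below are hypothetical and chosen only for illustration; they do not correspond to the graph of Figure 1.

```python
from typing import Dict, List


def all_paths(succ: Dict[str, List[str]], src: str, dst: str) -> List[List[str]]:
    """Enumerate every path from src to dst in a DAG (depth-first)."""
    if src == dst:
        return [[dst]]
    paths = []
    for nxt in succ.get(src, []):
        for tail in all_paths(succ, nxt, dst):
            paths.append([src] + tail)
    return paths


def path_length(path: List[str], wcet: Dict[str, int]) -> int:
    """|p| = sum of the WCETs of the tasks on the path."""
    return sum(wcet[t] for t in path)


def latency_met(start: Dict[str, int], ta: str, tb: str, bound: int) -> bool:
    """Condition (1): S(t_b) - S(t_a) <= L."""
    return start[tb] - start[ta] <= bound


# Hypothetical graph: t1 -> t2 -> t4 and t1 -> t3 -> t4.
succ = {"t1": ["t2", "t3"], "t2": ["t4"], "t3": ["t4"]}
wcet = {"t1": 1, "t2": 2, "t3": 3, "t4": 1}

paths = all_paths(succ, "t1", "t4")
lengths = [path_length(p, wcet) for p in paths]
longest = max(paths, key=lambda p: path_length(p, wcet))  # plays the role of l_p
```

Recursive enumeration is exponential in the worst case, so this sketch is only meant for small graphs.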

3. SCHEDULABILITY STUDY 

The studied problem is close to the problem
"$P \mid prec \mid C_{max}$" (using Lenstra's 3-field notation [18]),
which is known to be NP-hard [18]. The "$P \mid prec \mid C_{max}$"
problem aims to minimize the maximum completion time of all tasks,
whereas our objective is to determine the schedulability of the task
graph by finding whether a scheduling of all tasks of the graph on a
multiprocessor platform, satisfying the precedence and latency
constraints, exists or not. Consequently, our problem in the
one-latency case is also NP-hard. Moreover, in the case of several
latency constraints, the problem becomes NP-hard in the strong sense
because of the links between latency constraints.
Since the studied problem is NP-hard, no algorithm can solve it in
polynomial time (unless P = NP), and this is also true for the
schedulability condition. This means that, in the general case, it is
impossible to propose a necessary and sufficient condition that
checks in polynomial time whether a set of tasks under a latency
constraint is schedulable or not.

Fig. 1: Tasks under latency constraint

Algorithm 1 Allocation Algorithm

1: $m \leftarrow 1$
2: Sort paths in $\mathcal{P}$ in decreasing order of length
3: Select $l_p$ and initialize a set of tasks $\Phi = l_p$
4: while $\Phi \neq \mathcal{V}$ do
5:    For each path $p_i$ not already selected:
      $\lambda(p_i) = \sum_{t_j \in p_i \setminus \Phi} C(t_j)$
6:    Select $p_i$ such that $\lambda(p_i) = \max(\lambda)$
7:    $\Phi = \Phi \cup p_i$ (include $p_i$'s tasks in $\Phi$)
8:    $m \leftarrow m + 1$
9: end while

3.1 One Latency Constraint Case

Dealing with a latency constraint is closely linked to the structure
of the graph. That is the reason why a partitioning method based on
graph paths is proposed. Without loss of generality, in the present
paper it is considered that the whole graph is under the latency
constraint $L(t_a, t_b)$, which means that the considered graph has
one root vertex $t_a$ and one leaf vertex $t_b$ (see Figure 1). In
the case of graphs with large numbers of tasks and edges, the number
of paths is also very large. However, determining all paths is not an
NP-hard problem [19]. Besides, according to [20], there exist several
approaches for determining all paths of a graph, among which the
topological sort of the graph can be mentioned. However, in practice,
the number of paths is less than the number of vertices in a graph.
Even in a simple design with a small quantity of components, the
number of vertices in $G$ is more than 10 times the number of paths
in the architecture [21].

The allocation algorithm (Algorithm 1) takes as input all paths of
the graph and outputs a selection of some of them, each of which is
associated with a distinct processor. First, the algorithm sorts the
paths in $\mathcal{P}(t_a, t_b)$ in decreasing order of their
lengths; then it selects them one by one and allocates the tasks of
each path to the processor with which the path is associated. At each
step, the tasks belonging to a path $p_i$ which were not allocated
before via another path (the case of tasks belonging to several
paths) are allocated to the processor with which $p_i$ is associated.
The algorithm stops when all tasks under a latency constraint are
allocated, meaning that not all paths are necessarily selected. As a
result, each task of the application graph is allocated to only one
processor. Also, an integer $m$ is returned, equal to the number of
selected paths, which gives the number of required processors. In
other words, Algorithm 1 parallelizes the execution of the
application by allocating its tasks to a set of processors. Besides,
this parallelization follows the configuration of the paths which
compose the application graph.
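
The allocation procedure described above can be sketched in Python. This is one possible reading, not the authors' implementation: it assumes that the selection criterion at each step is the total WCET of the tasks of a path that are not yet allocated, that the list of paths is non-empty, and that the longest path counts as the first selected path. The name `allocate` and the example data are hypothetical.

```python
from typing import Dict, List, Tuple


def allocate(paths: List[List[str]], wcet: Dict[str, int]) -> Tuple[int, List[List[str]]]:
    """Greedy path-based allocation; returns (m, tasks allocated per processor)."""
    all_tasks = set().union(*paths)

    def length(path: List[str]) -> int:
        return sum(wcet[t] for t in path)

    # Sort paths by decreasing length and start from the longest one.
    remaining = sorted(paths, key=length, reverse=True)
    longest = remaining.pop(0)
    allocated = set(longest)
    processors = [list(longest)]

    while allocated != all_tasks:
        # Only paths that still bring unallocated tasks are candidates.
        candidates = [p for p in remaining if any(t not in allocated for t in p)]
        # Assumed criterion: largest WCET sum of not-yet-allocated tasks.
        best = max(candidates, key=lambda p: sum(wcet[t] for t in p if t not in allocated))
        remaining.remove(best)
        processors.append([t for t in best if t not in allocated])
        allocated.update(best)

    return len(processors), processors


# Hypothetical graph with three parallel branches between t1 and t4.
example_paths = [["t1", "t2", "t4"], ["t1", "t3", "t4"], ["t1", "t5", "t4"]]
example_wcet = {"t1": 1, "t2": 2, "t3": 3, "t4": 1, "t5": 1}
m, allocation = allocate(example_paths, example_wcet)
```

On this example the longest path is allocated to the first processor, and each of the two remaining branches requires one more processor. When one path already covers every task, a single processor is returned and the other paths are never selected.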
Fig. 2: Paths Allocation

An example of the application of Algorithm 1 is illustrated in
Figure 2. Processors $P_1$, $P_2$ and $P_3$ were required whereas
seven paths were detected (see the example of Section 2). For this
example it is assumed that the execution times of tasks are equal.
From now on, the set of paths $\mathcal{P}(t_a, t_b)$ is considered
to be composed of $m$ paths (the ones selected by Algorithm 1). Also,
we denote by $\bar{p}_i$ the set of tasks belonging exclusively to
$p_i$; more formally, if $t_l \in \bar{p}_i$ then
$\forall p_j \in \mathcal{P}(t_a, t_b) \setminus \{p_i\}$,
$t_l \notin p_j$.

One can ask what makes the number of processors returned by
Algorithm 1 so distinctive. The answer is that the value of $m$
represents the optimal number of processors, since it allows the
total parallelism inherent to the application graph to be exploited.
This means that if two tasks are not linked by a path in the graph
(neither is a predecessor or a successor of the other), then they are
allocated to distinct processors. Moreover, adding processors beyond
the $m$ processors required by Algorithm 1 does not improve the
exploitation of the parallelism inherent to the application graph.
Proposition 1 states the optimality of $m$.

PROPOSITION 1. The application of Algorithm 1 on an application
graph returns the optimal number of processors allowing the
exploitation of task parallelism.

Proof. Algorithm 1 allocates tasks according to the paths to which
they belong. Notice that the considered paths are those which include
at least one task that does not belong to any other path. Let us
assume that, for a given graph $G$, Algorithm 1 returns $m$
processors. Let us also assume that there exists a number of
processors $m'$ such that $m' < m$ for which the exploitation of the
parallelism of $G$ is optimal. This means that each pair of tasks
$(t_i, t_j)$ not linked by a path in $G$ is allocated to two distinct
processors among the $m'$ processors. As stated earlier, the graph
$G$ has only one root task $t_a$ and only one leaf task $t_b$; hence,
there exist two distinct paths which link $t_a$ to $t_b$ and include
$t_i$ for the first and $t_j$ for the second (because $t_i$ and $t_j$
are not linked). This implies that all distinct paths in $G$ are
concerned. Consequently, $(m - m')$ processors are missing in order
to parallelize all pairs $(t_i, t_j)$. □

From now on, Algorithm 1 is systematically applied to allocate
tasks. The following proposition introduces a necessary and
sufficient schedulability condition in the case of one latency
constraint.

PROPOSITION 2. Let $L$ be a latency constraint imposed on the tasks
pair $(t_a, t_b)$. Latency constraint $L(t_a, t_b)$ is met if and
only if: $\forall p_j \in \mathcal{P}(t_a, t_b)$,

$$\sum_{t_i \in \bigcap p_j} C(t_i) + \max_{p_j}\Big(\sum_{t_i \in \bar{p}_j} C(t_i)\Big) + M(m) \leq L \qquad (2)$$

Proof. This result is quite intuitive and can be obtained by
examining the inequality $S(t_b) - S(t_a) \leq L$. Indeed,
$S(t_b) - S(t_a)$, which is the scheduling time of the tasks under
latency constraint $L$, is equal to the sum of the execution times
of:

(1) Tasks which are non-parallelizable with any other tasks
(sequential tasks which are linked by a path in the application
graph). These are the tasks shared between all paths in
$\mathcal{P}(t_a, t_b)$ ($t_i \in \bigcap p_j$),

(2) Among parallel tasks, the longest sub-path selected from the
$m$ paths. On each processor $m_i$, the tasks of the set $\bar{p}_i$
are allocated, and the largest sum of execution times of the tasks of
each $\bar{p}_i$ is kept. This is due to the precedence between
tasks, which prevents distributing parallel tasks between processors
to get a more balanced distribution such as
$\frac{\sum_{t_i \in V'} C(t_i)}{m}$ ($V'$ is the set of tasks which
are in parallel in the application graph),

(3) Communication overhead. □

3.2 Several Latency Constraints Case 

In [22], the authors state that all possible combinations of two
pairs of tasks, each under a latency constraint, can be covered by
three cases:

— In parallel, when there is no path linking tasks under the first 
latency constraint to those under the second latency constraint. 

— In Z, when there is one (or more) path linking tasks under the 
first latency to those under the second latency or vice versa. 

— In X, there is one (or more) path linking tasks under the first 
(resp. second) latency to those under the second (resp. first) la- 
tency. 

For the Z and parallel relations, the schedulability study can be
performed as for the one-latency case. This follows from the fact
that latency constraints in these cases can be addressed one after
the other in order to check the schedulability of the whole system.
The X configuration, in contrast, is the hardest one to study because
the two latency constraints are dependent. In fact, satisfying one of
these latency constraints depends not only on the scheduling of the
tasks under this constraint but also on some tasks which are under
the other latency constraint. Usually, such a case amounts to
multi-objective optimization, and the problem becomes harder than in
the single-objective case [23].

Let us take an example of a task graph subject to a pair of latency
constraints in X. Figure 3 depicts a pair of latency constraints
$L_1$ and $L_2$ in X imposed between $(t_1, t_4)$ and
$(t_8, t_{11})$.
The following proposition introduces a necessary and sufficient
schedulability condition in the case of two latency constraints in
X.
Fig. 3: A pair of latency constraints in X

PROPOSITION 3. Let $(L_1, L_2)$ be two latency constraints in X
imposed, respectively, on tasks pairs $(t_a, t_b)$ and $(t_c, t_d)$.
Latency constraints $L_1(t_a, t_b)$ and $L_2(t_c, t_d)$ are met if
and only if:

(1) The condition of Proposition 2 is met for the tasks under $L_1$
with $m_1$ processors and for the tasks under $L_2$ with $m_2$
processors,

(2) and

$$\begin{cases}
\max_{p_i \in \mathcal{P}(t_c, t_b)} |p_i| + M(m) \leq L_1 \\
\max_{p_i \in \mathcal{P}(t_a, t_d)} |p_i| + M(m) \leq L_2
\end{cases} \qquad (3)$$

$m$, $m_1$ and $m_2$ are obtained by applying Algorithm 1 on the
graph under latency constraints $L_1$ and $L_2$. $m_1$ is the number
of processors to which the tasks under $L_1$ are allocated, $m_2$ is
the number of processors to which the tasks under $L_2$ are
allocated, and $m$ is the total number of required processors. Notice
that $m < m_1 + m_2$ because there exist tasks under the two latency
constraints.
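
Part (2) of this proposition only involves the longest paths crossing from one constraint to the other. A minimal sketch of that check is given below; it does not cover part (1), and the function names, the 5-task X-shaped graph and the overhead value are hypothetical.

```python
from typing import Dict, List, Optional


def longest_path(succ: Dict[str, List[str]], wcet: Dict[str, int],
                 src: str, dst: str) -> Optional[int]:
    """Largest |p| over all paths from src to dst, or None if no path exists."""
    if src == dst:
        return wcet[dst]
    best = None
    for nxt in succ.get(src, []):
        tail = longest_path(succ, wcet, nxt, dst)
        if tail is not None and (best is None or tail > best):
            best = tail
    return None if best is None else wcet[src] + best


def x_condition_met(succ: Dict[str, List[str]], wcet: Dict[str, int],
                    ta: str, tb: str, tc: str, td: str,
                    overhead: float, l1: float, l2: float) -> bool:
    """Check the two inequalities of condition (3) for constraints in X."""
    cross_cb = longest_path(succ, wcet, tc, tb)
    cross_ad = longest_path(succ, wcet, ta, td)
    if cross_cb is None or cross_ad is None:
        raise ValueError("the two constraints are not in X configuration")
    return cross_cb + overhead <= l1 and cross_ad + overhead <= l2


# Hypothetical X-shaped graph: ta -> tx -> tb and tc -> tx -> td.
succ = {"ta": ["tx"], "tc": ["tx"], "tx": ["tb", "td"]}
wcet = {"ta": 2, "tc": 1, "tx": 3, "tb": 1, "td": 2}
```

With an overhead M(m) = 1, the crossing paths have lengths 5 and 7, so the check passes for L1 = 6 and L2 = 8 and fails as soon as either bound is lowered by one.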

Proof. As expected, the one-latency schedulability condition
(condition (2)) becomes a necessary condition in the case of two
latency constraints in X. Indeed, if one of the two latency
constraints is not met, then the whole system is considered
non-schedulable. Then, in order to prove the sufficiency of the
condition proposed here, equations (3) are assumed to be satisfied
while constraints $L_1$ and $L_2$ are nevertheless not met. That
$L_1$ and $L_2$ are not met means that $S(t_b) - S(t_a) > L_1$ and
$S(t_d) - S(t_c) > L_2$.

$S(t_b) - S(t_a) > L_1$ means that:
Either,

$\exists p_i \in \mathcal{P}(t_a, t_b)$, $|p_i| + M(m_1) > L_1$.
This hypothesis contradicts condition (2) because:

(2) $\Rightarrow \forall p_i \in \mathcal{P}(t_a, t_b)$, $|p_i| \leq L_1$

Or,

as $t_c$ is a predecessor of the task $t_b$, the start of execution
of $t_b$ depends on the execution of $t_c$ and of other tasks which
are under the latency constraint $L_2$. Therefore, in the present
case, the start of execution of $t_b$ is delayed by the execution of
tasks under latency constraint $L_2$ whereas all predecessor tasks of
$t_b$ under latency constraint $L_1$ were executed. This is described
more formally by the following inequality:

$\exists t_x \in (\mathcal{V}(t_a, t_b) \cap \mathcal{V}(t_c, t_d))$,

$$\max |p_i| + M(m_2) < \max |p_j| + M(m_1) \qquad (4)$$
Furthermore,

$S(t_b) - S(t_a) > L_1$ and (4) $\Rightarrow$
$\exists p_j \in \mathcal{P}(t_c, t_x)$ and
$\exists p_k \in \mathcal{P}(t_x, t_b)$,

$$|p_j| + M(m_1) + |p_k| + M(m_2) > L_1 \qquad (5)$$

Otherwise, it is clear that:

$$|p_j| + M(m_1) + |p_k| + M(m_2) < \max_{p_i \in \mathcal{P}(t_c, t_b)} |p_i| + M(m) \qquad (6)$$

From condition (3), equation (5) contradicts equation (6).

The same reasoning can be followed to prove that
$S(t_d) - S(t_c) > L_2$ contradicts the assumption that the
constraint $L_2$ is met. □

The result of Proposition 3 is easily generalizable to a task graph
subject to $n$ latency constraints which are, two by two, in X
configuration. Indeed, it suffices to check the conditions of
Proposition 3 for each pair of latency constraints in X and then to
conclude on the schedulability of the whole system. So, using the
results of Propositions 2 and 3, any application graph can be dealt
with, whatever the number of imposed latency constraints is and
however these latency constraints are configured.

The schedulability study performed earlier introduces schedulability
conditions for a number of processors which is the optimal number
needed to exploit all the parallelism inherent to the application
graph, but the proposed conditions do not fit a system with a static
architecture (i.e., the number of processors is known beforehand and
fixed). When system designers face such systems, they tend towards
fast analysis methods even though these methods are not as exact as
optimal methods. Since the targeted problem is NP-hard in the strong
sense, the schedulability analysis of such systems through optimal
approaches, or even heuristics, takes a very long time. Instead of an
optimal schedulability analysis, the paper proposes practical lower
bounds for latency constraint values $L_i$, whatever the number of
processors is. Hence, system designers can refer to the proposed
conditions to adjust the latency constraint values while saving
considerable time. The following proposition introduces lower bounds
for latency constraint values according to the different
configurations.

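PROPOSITION 4. 1. If $L$ is a latency constraint imposed on the
tasks pair $(t_a, t_b)$, the lower bound of $L(t_a, t_b)$ is:

$$L^{lb} = \sum_{t_i \in \bigcap p_j} C(t_i) + \max_{p_j \in \mathcal{P}(t_a, t_b)}\Big(\sum_{t_i \in \bar{p}_j} C(t_i)\Big) + M(m) \qquad (7)$$

2. If $(L_1, L_2)$ are two latency constraints in X imposed,
respectively, on tasks pairs $(t_a, t_b)$ and $(t_c, t_d)$, the lower
bounds of $L_1(t_a, t_b)$ and $L_2(t_c, t_d)$ are:

$$L_1^{lb} = \max\Big(\sum_{t_i \in \bigcap_{p_j \in \mathcal{P}(t_a, t_b)} p_j} C(t_i) + \max_{p_j \in \mathcal{P}(t_a, t_b)}\Big(\sum_{t_i \in \bar{p}_j} C(t_i)\Big) + M(m_1),\ \max_{p_j \in \mathcal{P}(t_c, t_b)} |p_j| + M(m)\Big)$$

$$L_2^{lb} = \max\Big(\sum_{t_i \in \bigcap_{p_j \in \mathcal{P}(t_c, t_d)} p_j} C(t_i) + \max_{p_j \in \mathcal{P}(t_c, t_d)}\Big(\sum_{t_i \in \bar{p}_j} C(t_i)\Big) + M(m_2),\ \max_{p_j \in \mathcal{P}(t_a, t_d)} |p_j| + M(m)\Big) \qquad (8)$$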

Proof. $L^{lb}$ represents a lower bound on the scheduling time
between $t_a$ and $t_b$ ($S(t_b) - S(t_a)$). This means that the
value that the system designer gives to $L(t_a, t_b)$ must not be
lower than $L^{lb}$; otherwise, the latency constraint will
necessarily not be met. As applications with a high computation cost
are targeted, the use of more processors reduces the scheduling time.
Conversely, reducing the number of processors increases the
scheduling time. This proves that $L^{lb}$, in the different
configurations seen above, is a minimum of the scheduling time for
systems where the number of processors is less than $m$.
In addition, as $m$ represents the optimal number of processors to
get the optimal parallelism within the application graph, using more
than $m$ processors does not reduce the scheduling time. □

3.3 Performance Evaluation 

In order to evaluate the performance of the schedulability condition
of Proposition 3, we implemented an application, referred to as the
proposed approach, which, for a given task graph under a pair of
latency constraints in X, checks the conditions of Proposition 3 and
outputs the schedulability of the system according to the obtained
result. Then, two kinds of tests are performed:

— an evaluation of time performances of the proposed solution, 

— a comparison with solutions provided by the constraint program- 
ming approach. 

The task graphs (DAGs) used for the evaluation were generated
randomly according to the two following parameters: number of tasks
and density. In our case, the graph density is the ratio between the
number of edges in the graph and the number of possible edges (in the
complete graph). For example, a graph of 12 tasks with 0.5 density
has 33 edges, whereas a complete graph of 12 tasks has 66 edges.
Notice that the number of edges in a complete graph is
$\frac{n(n-1)}{2}$, where $n$ is the number of tasks.

Inside the graph, 40% of the tasks are put under the constraint
$L_1$ and 40% under the constraint $L_2$. The remaining 20% are put
under both constraints $L_1$ and $L_2$. An example of a generated
graph with 12 tasks and a density of 0.25 (17 edges) is given in
Figure 4: 5 tasks are exclusively under the constraint $L_1$, 5 other
tasks are exclusively under $L_2$, and 2 tasks are under both $L_1$
and $L_2$.
In the generated graph, the number of edges is determined by the
density (as explained in the previous paragraph), whereas the
configuration of these edges is defined randomly as follows:




Fig. 4: Example of generated 12 tasks graph 
— a set of randomly generated edges, restricted so as to ensure the
X configuration of latency constraints (the edges in solid lines in
the graph of Figure 4),

— a set of randomly generated edges between tasks under the same
latency constraint (the edges in dashed lines in the graph of
Figure 4), which satisfy the DAG properties of the graph.
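
A simplified version of this generation step, driven only by the number of tasks and the density, can be sketched as follows. It does not enforce the X configuration of the latency constraints. Rounding the edge count up is an assumption that happens to reproduce the edge counts quoted in the text (33 and 17 edges for 12 tasks), and `random_dag` is a hypothetical name.

```python
import math
import random
from typing import List, Optional, Tuple


def random_dag(n_tasks: int, density: float, seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return a random edge list (i, j) with i < j, which guarantees acyclicity."""
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must be in [0, 1]")
    rng = random.Random(seed)

    # Number of edges of the complete graph: n(n-1)/2.
    max_edges = n_tasks * (n_tasks - 1) // 2
    # Assumed rounding rule: round the target edge count up.
    n_edges = math.ceil(density * max_edges)

    # Orienting every edge from the lower to the higher index forbids cycles.
    candidates = [(i, j) for i in range(n_tasks) for j in range(i + 1, n_tasks)]
    return rng.sample(candidates, n_edges)


edges = random_dag(12, 0.25, seed=0)
```

Sampling without replacement from the candidate list avoids duplicate edges, and the index ordering acts as a topological order of the generated graph.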

The first test concerns the time performance of the proposed
approach as a function of the number of tasks and the density of the
graph. The diagram of Figure 5 depicts the evolution of the runtime
as a 3D surface. It shows that increasing the density has a greater
impact on the runtime of the proposed approach than increasing the
number of tasks. This is mainly explained by the fact that the number
of paths increases when the graph has a higher density. Moreover, the
runtime of the proposed approach is very reasonable even when the
density is high. Notice that the runtime is plotted on a logarithmic
scale and that results were collected on a machine with a 3.4 GHz
Intel Core i7 processor and 10 GB of main memory.
Table 1: Definition of Variables and Domains

Variable               Domain
NbTasks                $\mathbb{N}^+$
NbProcs                $\mathbb{N}^+$
duration($t_i$)        $\mathbb{N}^+$
task($t_i$, $proc_j$)  [StartOf($t_i$), EndOf($t_i$)] $\subseteq \mathbb{N}^+$


Table 2: Definition of Constraints

Constraint                                         Description
if $(t_i, t_j) \in E$                              $t_i$ is a predecessor of $t_j$
  then EndOf($t_i$) $\leq$ StartOf($t_j$)
$\forall t_i \in V$, $\forall proc_j$,             each task needs only one
  alternative(task($t_i$, $proc_j$))               processor to be executed
$\forall proc_j$, noOverlap($proc_j$)              no overlap on processors



Fig. 5: Proposed approach runtime evolution (axes: runtime in
seconds, graph density, tasks number)



The second test targets the efficiency of the proposed approach in
terms of schedulability and lower bounds. To do so, we chose to use
constraint programming (CP) to solve the latency constraint
scheduling problem and to compare the obtained results with the
results of the proposed approach.

Constraint programming is a programming approach that is oriented
towards relationships or constraints among entities [24]. The most
important reason for this choice is that constraint programming has
a rich modeling language which is very convenient for expressing the
problem. Moreover, the underlying CP solver is relatively robust with
respect to the addition of new constraints, and the search can be
controlled entirely by the user.

Our problem was solved using the commercial software ILOG OPL Studio
according to the following CP formulation. The objective is to
minimize the scheduling time of the tasks under $L_1$ by minimizing
the start time of $t_b$ and, at the same time, to minimize the
scheduling time of the tasks under $L_2$ by minimizing the start time
of $t_d$ (knowing that the latency constraints are imposed on
$(t_a, t_b)$ and $(t_c, t_d)$). Hence, the multiple objectives are
expressed as a single objective by summing them together and applying
a weight to each objective to signify its relative importance. It was
assumed, at first, that the two objectives have the same importance
and, consequently, the same weight. However, the runtime of the CP
approach exploded, even for small graphs. Hence, the CP approach
minimizes $L_1$ first, then $L_2$. Thus, the objective function is:

$$\min \, \bigl( x \cdot \mathrm{StartOf}(t_b) + y \cdot \mathrm{StartOf}(t_d) \bigr)$$

where (x, y) = (1, 0) and then (x, y) = (0, 1). In addition, the
variable domains and constraints are given in Tables 1 and 2. The
constraints of Table 2 are provided by ILOG OPL Studio for scheduling
modeling [25]. The number of processors is defined by Algorithm 1.
To do so, within the CP approach the objective was to look for the
schedule which minimizes the start dates of $t_b$ and $t_d$, then to
compute the values of $L_1^{opt} = \mathrm{StartOf}(t_b)$ and
$L_2^{opt} = \mathrm{StartOf}(t_d)$. These values are the optimal
(smallest) values that $L_1$ and $L_2$ can have. Then, they were
compared to the values of $L_1^{lb}$ and $L_2^{lb}$ resulting from
the calculation of equations (8).
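
The two-phase weighting, (x, y) = (1, 0) followed by (x, y) = (0, 1),
can be sketched over a set of already enumerated feasible schedules.
The sketch below is illustrative only: the function names, the list of
candidate start-time pairs, and the assumption that the second phase
keeps StartOf(t_b) at its first-phase optimum are not taken from the
OPL model.

```python
def weighted_objective(start_tb, start_td, x, y):
    """Single objective x * StartOf(t_b) + y * StartOf(t_d)."""
    return x * start_tb + y * start_td


def two_phase_minimum(candidates):
    """Minimize StartOf(t_b) first, then StartOf(t_d).

    candidates: non-empty list of (StartOf(t_b), StartOf(t_d)) pairs,
    one per feasible schedule (hypothetical input for this sketch).
    """
    if not candidates:
        raise ValueError("at least one feasible schedule is required")

    # Phase 1: (x, y) = (1, 0), only the start time of t_b matters.
    l1_opt = min(weighted_objective(tb, td, 1, 0) for tb, td in candidates)

    # Assumption: phase 2 only keeps schedules that reach the phase-1 optimum.
    kept = [(tb, td) for tb, td in candidates if tb == l1_opt]

    # Phase 2: (x, y) = (0, 1), only the start time of t_d matters.
    l2_opt = min(weighted_objective(tb, td, 0, 1) for tb, td in kept)
    return l1_opt, l2_opt


# Made-up start-time pairs, not experimental data.
pairs = [(4, 9), (3, 12), (3, 10), (5, 7)]
print(two_phase_minimum(pairs))  # prints (3, 10)
```

With equal weights (x, y) = (1, 1) the same candidates would select
the pair (5, 7), which shows how giving priority to StartOf(t_b) can
cost some quality on StartOf(t_d).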

After that, the value of p is computed, which is the ratio between
$L_i^{opt}$ and $L_i^{lb}$, such that $p(L_i) = L_i^{opt} / L_i^{lb}$,
in order to get an idea of how far the results of the proposed
approach are from the optimal ones. For each task count in the list
[12, 14, 16], up to 20 different graphs were generated and both
approaches were applied to them. Notice that the chosen density of
all tested graphs was 0.4. At the beginning, the two approaches were
executed on an m-processor architecture (m is given by Algorithm 1).
After that, the number of processors was reduced and fixed from the
list [4, 3, 2], and only the optimal approach was executed. Notice
that the proposed approach cannot be executed in this setting since
it fixes the number of processors itself. Results are illustrated by
the diagrams of Figure 6 ($p(L_1)$ is marked in black and $p(L_2)$ in
white).
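
The ratio p between the optimal value and the lower bound can be
computed per graph and summarized per test set as in the following
sketch. The numeric values are invented for illustration and are not
the experimental results; the function names are likewise assumptions.

```python
def ratio(l_opt, l_lb):
    """p(L_i) = L_i^opt / L_i^lb; equals 1 when the lower bound is reached."""
    if l_lb <= 0:
        raise ValueError("the lower bound must be positive")
    return l_opt / l_lb


def worst_ratio(results):
    """Largest p over a list of (l_opt, l_lb) pairs, one pair per graph."""
    return max(ratio(l_opt, l_lb) for l_opt, l_lb in results)


# Made-up (L_opt, L_lb) pairs for three graphs on a reduced processor count.
sample = [(10, 10), (15, 12), (18, 12)]
print(worst_ratio(sample))  # prints 1.5
```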

Fig. 6: Proposed approach schedulability performances

As expected, p is equal to 1 when the number of processors is equal
to m, meaning that our approach, like the optimal approach, returns
the optimal latency values in the case of m processors. Once the
number of processors is reduced, the values of p increase, meaning
that the values returned by CP are larger than $L_1^{lb}$ and
$L_2^{lb}$, which confirms their role as lower bounds. Notice that
the values of p also increase from the first set of tests (12 tasks)
to the second set (14 tasks), then increase again in the third set
(16 tasks). This is explained by the fact that when the number of
tasks increases, the number of paths increases too, which raises the
value of the optimal number of processors m.
In addition, the proposed approach provides interesting results:
across the three sets of tests, the optimal approach returned results
between 1.25 and 1.5 times the results of the proposed approach. This
means that, in all performed experiments, the proposed approach gives
$L_i$ a value which is, at worst, around 1.5 times smaller than the
one given by the CP approach. Hence, the proposed lower bounds can be
considered efficient given the difference between the runtimes of the
two approaches. The slight difference between $p(L_1)$ and $p(L_2)$
is explained by the fact that, in the CP approach, priority is given
to the minimization of StartOf($t_b$) at the cost of the minimization
of StartOf($t_d$). As with any other optimal method, the runtime of
the CP approach grows exponentially as the number of tasks increases,
which prevented us from considering graphs of more than 16 tasks.

4. CONCLUSION 

The paper presents a theoretical study of real-time non preemptive
multiprocessor scheduling with precedence and several latency
constraints. After assessing the NP-hardness of this problem, an
algorithm is proposed for allocating the tasks of the application
graph to a number of processors that allows optimal exploitation of
task parallelism. The schedulability study proposed here introduces a
first condition for the case of one latency constraint. Then, after
giving the different possible configurations in the case of several
latency constraints, it introduces a second condition to check the
schedulability of latency constraints in the hardest configuration in
terms of complexity. Finally, practical lower bounds were deduced
from the proposed conditions.

The first phase of tests demonstrates that the proposed approach has
a very competitive runtime. The second phase compared it with an
optimal approach, namely the Constraint Programming approach. These
tests showed that the proposed approach provides interesting results
in terms of schedulability and lower bounds.
The performed study assumes that the number of processors is at least
equal to the number of paths selected by the allocation algorithm.
Hence, it is planned to explore the possibility of including the
number of processors in the schedulability condition as a fixed
parameter.